Update cythonization in CI #4764
Conversation
Note that I expect this to fail for all builds except Python 3.7, as that's the one build in which we are cythonizing things (see distributed/.github/workflows/tests.yaml, lines 51 to 54 at ea5cf0e).
Alright, as @jakirkham mentioned in #4760 (comment), our current CI setup isn't testing the cythonized scheduler correctly. I've updated our CI installation step and added a small test to ensure that we're cythonizing the scheduler as expected.
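For context, a minimal sketch of what such a "was it actually cythonized?" check could look like (hypothetical; the PR's actual test may differ). A compiled extension module's `__file__` typically ends in `.so`/`.pyd`, while a pure-Python module's ends in `.py`:

```python
import json  # pure-Python stdlib module, for contrast
import math  # compiled (extension or builtin) stdlib module

def looks_compiled(mod):
    """Heuristic check: builtin modules have no __file__ at all, and
    compiled extension modules end in .so (Linux/macOS) or .pyd (Windows)."""
    fname = getattr(mod, "__file__", None)
    return fname is None or fname.endswith((".so", ".pyd"))

# A Cythonized distributed.scheduler would pass this check; the
# pure-Python version would not.
assert looks_compiled(math)
assert not looks_compiled(json)
```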
Yeah, that doesn't surprise me. Had tried to get this to work back when GitHub Actions was only used for Windows ( #4326 ), but ran into difficulties. Not having access to Windows locally, and the fact that we are primarily Cythonizing the Scheduler on Linux, made it hard to prioritize further work to fix CI. Agree it would be good to get this working, and it may make more sense to do now as we are making fewer changes to the Scheduler.
Sounds good, thanks for taking a look at this @jakirkham. I'm waiting for …
I think one of the issues that we are likely running into here is the fact that we have both a …
Ah, sorry I missed that you linked to #4326. Would you prefer for us to move back to that PR instead of what I have here?
Not at all. That was more for context. The linked PR is pretty old and likely needs to be refreshed. Happy to just start here with what you have 🙂
Woot! 🎉 Looks like we are now reproducing issue ( #4760 ) on CI.
So how should we handle incorporating these PRs? Does GitHub Actions have a concept of allowed failures? Or are we ok with having a failure that we clear out with a later PR?
Hmm, good question. I'd like to merge #4761 and then merge …
SGTM. Happy to dig into any additional failures we find. FWIW, I have usually been running the Scheduler tests locally and sometimes other components (like Client and Worker tests), but not always the full test suite. So it is possible that we encounter some failures that are a bit more off the beaten path.
Had some lingering changes to add a global variable to the Scheduler file indicating whether it was compiled or not. Cleaned them up and submitted them as PR ( jrbourbeau#2 ) based off this one. This should make it a bit easier to check whether the Scheduler was Cythonized. Also this can be used to mark tests as …
* Add `COMPILED` global variable to scheduler — this should make it easy to tell whether the scheduler was compiled with Cython or not.
* Simplify Cythonized Scheduler check
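A sketch of how such a `COMPILED` flag can be defined, assuming Cython's `cython.compiled` compile-time constant (the actual PR may do this differently). The constant is `True` inside a Cython-compiled module and `False` when the same source runs as plain Python; the `try`/`except` keeps the pure-Python path working even without Cython installed. Tests can then be marked conditionally:

```python
try:
    import cython
    # cython.compiled is True inside a Cython-compiled module and
    # False when the same source runs as plain Python.
    COMPILED = cython.compiled
except ImportError:
    # Cython isn't installed, so this certainly isn't compiled.
    COMPILED = False

# Conditional test marking (shown here with unittest; pytest's
# xfail/skipif markers work the same way on this flag).
import unittest

class SchedulerPickleTests(unittest.TestCase):
    @unittest.skipIf(COMPILED, "cyfunctions cannot be pickled")
    def test_placeholder(self):
        self.assertTrue(True)
```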
It looks like some tests are failing because `is_coroutine_function` returns `False` for the cythonized scheduler's coroutine methods.

With cythonization:

```python
In [1]: from distributed import Scheduler

In [2]: Scheduler.restart
Out[2]: <cyfunction Scheduler.restart at 0x120db4d40>

In [3]: from distributed.utils import is_coroutine_function

In [4]: is_coroutine_function(Scheduler.restart)
Out[4]: False
```

Without cythonization:

```python
In [1]: from distributed import Scheduler

In [2]: Scheduler.restart
Out[2]: <function distributed.scheduler.Scheduler.restart(self, client=None, timeout=3)>

In [3]: from distributed.utils import is_coroutine_function

In [4]: is_coroutine_function(Scheduler.restart)
Out[4]: True
```

Is there a way we can have …
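For context on why the check fails: `inspect.iscoroutinefunction` tests the `CO_COROUTINE` flag on the function's `__code__` object, and Cython-compiled `cyfunction`s don't carry that flag. A pure-Python illustration of the mechanism, plus one hedged workaround (a hypothetical marker-attribute convention, not distributed's actual fix):

```python
import inspect

async def restart():          # stand-in for Scheduler.restart
    pass

def plain():
    pass

# inspect.iscoroutinefunction keys off this code-object flag, which a
# compiled cyfunction doesn't expose the same way.
assert restart.__code__.co_flags & inspect.CO_COROUTINE
assert not (plain.__code__.co_flags & inspect.CO_COROUTINE)
assert inspect.iscoroutinefunction(restart)

def is_coroutine_function(func):
    """Fall back to an explicit marker attribute that a compiled
    wrapper could set (hypothetical convention)."""
    return inspect.iscoroutinefunction(func) or getattr(func, "_is_coroutine", False)

assert is_coroutine_function(restart)
assert not is_coroutine_function(plain)
```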
Another class of failures is due to a `PicklingError` raised when serializing cythonized functions:

```
Traceback (most recent call last):
  File "/home/runner/work/distributed/distributed/distributed/protocol/serialize.py", line 329, in serialize
    header, frames = dumps(x, context=context) if wants_context else dumps(x)
  File "/home/runner/work/distributed/distributed/distributed/protocol/serialize.py", line 55, in pickle_dumps
    protocol=context.get("pickle-protocol", None) if context else None,
  File "/home/runner/work/distributed/distributed/distributed/protocol/pickle.py", line 60, in dumps
    result = cloudpickle.dumps(x, **dump_kwargs)
  File "/usr/share/miniconda3/envs/dask-distributed/lib/python3.7/site-packages/cloudpickle/cloudpickle_fast.py", line 73, in dumps
    cp.dump(obj)
  File "/usr/share/miniconda3/envs/dask-distributed/lib/python3.7/site-packages/cloudpickle/cloudpickle_fast.py", line 563, in dump
    return Pickler.dump(self, obj)
_pickle.PicklingError: Can't pickle <cyfunction Scheduler.__init__.<locals>.<lambda> at 0x7f7cd474d390>: attribute lookup lambda5 on distributed.scheduler failed
```
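A common remedy for this class of failure (a sketch under assumed names, not the actual change later made in #4768): hoist in-method lambdas to module-level functions, optionally bound with `functools.partial`, so pickle/cloudpickle can find them by qualified name even when compiled:

```python
import pickle
from functools import partial

# The failing object above is a lambda created inside Scheduler.__init__;
# compiled as a cyfunction, it can't be resolved by attribute lookup.
# A module-level function is picklable by name under both CPython and
# Cython. Names below are illustrative only.

def _scale(factor, x):
    return factor * x

# instead of: callback = lambda x: 2 * x
callback = partial(_scale, 2)

restored = pickle.loads(pickle.dumps(callback))
assert restored(10) == 20
```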
Thanks James! 😄 WDYT about marking all of these xfail and merging this PR? I think the issue in these cases is Cython constructing functions for some of these more atypical cases (closures, lambdas, coroutines, etc.) such that they work in C, but that may come with some tradeoffs (like not being able to pickle them). Some of these might be solved with a small rewrite or similar. Some of these may be bugs in Cython. Will take a look and see what we can do for these.
+1 to marking tests right now
Thanks James! 😄
Opened PR ( #4767 ) to start working on fixing the skipped/xfailed tests |
Fixing a couple of pickling issues with PR ( #4768 ) |
Not sure atm, but did find an MRE. Filed as issue ( cython/cython#4138 ) |
I've posted this on the Cython bug, but for reference: distributed/distributed/utils.py, line 1264 at 56aed44.
I think the consensus for Cython is that the …
Yeah, that makes sense. Thanks for the advice 🙂 Was looking through the … Think we could use …
Submitted PR ( #4771 ) to switch to the …
This is just a sanity check to see which CI test builds are using the cythonized scheduler. cc @jakirkham @crusaderky
xref #4760 (comment)